3. Designing and Configuring DAGs
When deploying a CCR environment in Exchange 2007, sizing was straightforward: the databases were running on one node or the other. In Exchange 2010, which supports up to 16 members in a DAG and up to 1,600 databases, sizing and designing the layout is far more complex. As a rule, the more servers you have in a DAG, the more options you have for laying out your database copies efficiently and resiliently. Consider the implications of a three-copy, six-server DAG versus two DAGs with three servers and three copies of each database. More servers in a single DAG give you more flexibility in placing copies and balancing load. To illustrate, if a member hosting three active databases fails in a three-member DAG, the two remaining members must absorb its load, as shown in Figure 10.
Compared with two 3-member DAGs, a 6-member DAG can spread the load of a failed member across more servers and can sustain more member failures.
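The arithmetic behind this comparison can be sketched as follows. The function `busiest_survivor_load` is a hypothetical helper, not an Exchange tool. It assumes that every member starts with the same number of active databases and that copies are laid out so the failed members' databases can activate evenly across all survivors, so its result is a best-case figure for the busiest surviving member.

```python
import math


def busiest_survivor_load(members: int, active_per_member: int, failed: int = 1) -> int:
    """Best-case active-database count on the busiest surviving member.

    Hypothetical model: every member starts with `active_per_member` active
    databases, and the failed members' databases are spread as evenly as
    possible across the survivors (which requires copies on all of them).
    """
    survivors = members - failed
    if survivors <= 0:
        raise ValueError("at least one member must survive")
    total_active = members * active_per_member
    # The survivors must now carry every active database between them.
    return math.ceil(total_active / survivors)


# One member fails in a 3-member DAG vs. a 6-member DAG, 3 active databases each.
print(busiest_survivor_load(3, 3))  # 5
print(busiest_survivor_load(6, 3))  # 4
```

Under these assumptions, the busiest member of the three-member DAG goes from three to five active databases, while in the six-member DAG it goes from three to four.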
In Figure 10 the DAG was designed to sustain a single-node failure; if more than one member were down, at least two databases would be offline. Simply adding a member to a DAG does not automatically enable it to sustain multiple failures, as Figure 11 shows. In this four-member DAG, the servers are configured as mirrored pairs. If both A and B fail, or both C and D fail, a large number of databases become unavailable. This configuration provides no better member redundancy than two 2-member DAGs.
Design the database copy layout around the worst-case failure you must survive to meet your agreed-upon SLAs. The following two rules apply for redundancy:
One-member failure requires two or more high-availability copies, two or more servers, and a witness server.
Two-member failure requires three or more high-availability copies, four or more servers, and a witness server.
Rather than mirroring database copies between pairs of servers, stripe the copies across the members or distribute them randomly across the DAG, so that a small number of member failures is less likely to take a large number of databases offline.
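To make the mirrored-versus-striped comparison concrete, the sketch below enumerates every combination of simultaneous member failures and counts the databases left without a surviving copy. The layouts are hypothetical: four members (A to D) and 12 databases in each case, with the mirrored layout modeled on the pairs in Figure 11. A database is treated as offline only when every server hosting one of its copies has failed; quorum and the witness server are not modeled.

```python
from itertools import combinations

SERVERS = ["A", "B", "C", "D"]


def offline_databases(layout, failed):
    """Databases whose every copy sits on a failed member."""
    return sorted(db for db, hosts in layout.items() if set(hosts) <= set(failed))


def worst_case_offline(layout, servers, k):
    """Largest number of databases taken offline by any k simultaneous failures."""
    return max(len(offline_databases(layout, failed)) for failed in combinations(servers, k))


def build_layout(groups, dbs_per_group):
    """Place dbs_per_group databases on each server group, one copy per server."""
    layout = {}
    for group in groups:
        for _ in range(dbs_per_group):
            layout[f"DB{len(layout) + 1:02d}"] = list(group)
    return layout


# Mirrored pairs (Figure 11 style): A/B and C/D each hold 6 two-copy databases.
mirrored = build_layout([("A", "B"), ("C", "D")], 6)
# Striped, two copies: 2 databases on each of the 6 possible server pairs.
striped = build_layout(list(combinations(SERVERS, 2)), 2)
# Striped, three copies: 3 databases on each of the 4 possible server triples.
striped_3 = build_layout(list(combinations(SERVERS, 3)), 3)

for name, layout in [("mirrored", mirrored), ("striped", striped), ("striped, 3 copies", striped_3)]:
    print(name, worst_case_offline(layout, SERVERS, 2))
```

With two copies, striping does not prevent a double failure from taking some databases offline, but it shrinks the worst case from 6 databases to 2. With three copies striped across four members, no two-member failure takes any database offline, which matches the second rule above.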
When planning the copy layout for the worst case, make sure each member can handle all of the database copies it hosts becoming active. If you plan to oversubscribe the members, use the Set-MailboxServer cmdlet with the -MaximumActiveDatabases parameter to cap the number of simultaneously active databases on each member, so that no more copies come online than the server can handle. Once a Mailbox server reaches this maximum, further mounts on it fail: if Active Manager attempts to mount a database on that server, the mount fails and Active Manager tries to mount the database copy on another member, if one is available. Because usage profiles change over time, periodically reevaluate the appropriate level of oversubscription and whether the number of active database copies should be adjusted to accommodate hardware and usage changes.
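The effect of the cap on where a database lands can be modeled roughly as below. `pick_mount_target` is a hypothetical illustration, not Active Manager's algorithm: the real selection also weighs copy health and queue lengths, while this sketch only walks the activation-preference list and skips members that are unavailable or already at their MaximumActiveDatabases value. The server names are made up.

```python
def pick_mount_target(preference, active_counts, max_active, unavailable=()):
    """Return the first available member, in activation-preference order, that
    is below its active-database cap, or None if no member qualifies.

    Hypothetical model: a member with no entry in `max_active` is uncapped.
    """
    for server in preference:
        if server in unavailable:
            continue
        cap = max_active.get(server)
        if cap is None or active_counts.get(server, 0) < cap:
            return server
    return None


preference = ["MBX1", "MBX2", "MBX3"]
active = {"MBX2": 20, "MBX3": 15}
caps = {"MBX1": 20, "MBX2": 20, "MBX3": 20}
# MBX1 is down and MBX2 is at its cap, so the copy on MBX3 is mounted.
print(pick_mount_target(preference, active, caps, unavailable={"MBX1"}))  # MBX3
```

If every remaining member were at its cap, the function would return None, which corresponds to the database staying dismounted.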
Over time, maintenance can leave mailbox databases active on servers other than the ones intended for them. As part of routine maintenance, remember to rebalance the active database copies across the DAG. You can also use the RedistributeActiveDatabases.ps1 script, included in Exchange 2010 SP1, to automatically balance active database copies across DAG members.
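The idea of rebalancing by activation preference can be sketched as follows. This is a simplified, hypothetical model and does not reproduce what RedistributeActiveDatabases.ps1 actually does: it only lists the moves needed to return each database to its most-preferred healthy copy, given a layout that maps each database to its servers in activation-preference order.

```python
def moves_to_preferred(layout, current_active, healthy):
    """Return (database, from_server, to_server) moves that would put each
    database back on its most-preferred healthy copy.

    layout maps database -> servers in activation-preference order.
    """
    moves = []
    for db, preference in layout.items():
        target = next((s for s in preference if s in healthy), None)
        if target is not None and target != current_active[db]:
            moves.append((db, current_active[db], target))
    return moves


layout = {"DB1": ["A", "B"], "DB2": ["B", "A"], "DB3": ["A", "B"]}
# After maintenance on A, every database ended up active on B.
current = {"DB1": "B", "DB2": "B", "DB3": "B"}
print(moves_to_preferred(layout, current, healthy={"A", "B"}))
# [('DB1', 'B', 'A'), ('DB3', 'B', 'A')]
```

While A is still out of service (not in `healthy`), the sketch proposes no moves, since B is then the most-preferred healthy copy for every database.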
Deciding the number and location of database copies also depends on the storage infrastructure and the operational maturity of your IT department. Assuming the operational challenges can be overcome, consider the best practices summarized in Table 2 when choosing between RAID (Redundant Array of Independent Disks) and JBOD (just a bunch of disks).
Table 2. Choosing Between RAID and JBOD in a Single-Site Deployment
NUMBER OF COPIES | STORAGE OPTIONS |
---|---|
Two high-availability copies | RAID |
Three or more high-availability copies | RAID or JBOD |
One active copy and one lagged copy | RAID |
When a large number of databases are hosted on each server in a DAG, disk management can become complicated, especially with JBOD storage. Only 23 drive letters are available for mounting additional disk drives: A and B are reserved, and the operating system is most likely installed on C. When planning a DAG that will require many volumes, it is a best practice to use volume mount points rather than drive letters. Volume mount points allow volumes to be mounted as directories rather than as drive letters. For example, you may want to mount a 1-TB volume at D:\Databases\Dallas-MB01 to store the Dallas-MB01 database files. You could then mount another 1-TB volume at D:\Databases\Dallas-MB02 to store the Dallas-MB02 database files. This way you are no longer constrained by the number of available drive letters.
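The drive-letter arithmetic and a mount-point naming scheme can be sketched as below. Both helpers are hypothetical and only compute paths; they do not create volumes or mount points. The `D:\Databases` root and the `Dallas-MBnn` names follow the example above, and the 40-database count is an arbitrary illustration.

```python
import string
from pathlib import PureWindowsPath


def available_drive_letters(reserved=("A", "B", "C")):
    """Drive letters left for data volumes after excluding reserved ones."""
    return [letter for letter in string.ascii_uppercase if letter not in reserved]


def mount_point_plan(root, database_names):
    """Map each database to its own mount-point directory under one host volume."""
    base = PureWindowsPath(root)
    return {name: base / name for name in database_names}


# 40 databases would exceed the 23 free letters, but as mount points they use only D:.
names = [f"Dallas-MB{i:02d}" for i in range(1, 41)]
plan = mount_point_plan("D:\\Databases", names)
print(len(available_drive_letters()))  # 23
print(plan["Dallas-MB02"])  # D:\Databases\Dallas-MB02
```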
Using mount points introduces a problem: if the drive that contains the mount points fails, you lose access to every volume mounted beneath it. The best practice is to protect the volume that contains the mount points with RAID, reducing the likelihood that a single disk failure takes the entire server offline.